Symptoms of anal incontinence and quality of life: a psychometric study of the Norwegian version of the ICIQ-B amongst hospital outpatients

Background The International Consultation on Incontinence Questionnaire-Bowel (ICIQ-B), a self-report, condition-specific questionnaire designed to assess symptoms of anal incontinence (AI), measures AI’s impact on quality of life (QoL) along with perceived bowel patterns and bowel control amongst individuals with AI. In our study, we aimed to translate the ICIQ-B to Norwegian and investigate the Norwegian version’s psychometric properties.
Methods To establish a relevant, comprehensive, and understandable Norwegian ICIQ-B, cognitive interviews were conducted with 10 patients with AI, and six clinical experts reviewed the translated scale. The Norwegian ICIQ-B’s structural validity, scale reliability, and content validity were tested amongst patients with AI attending hospital outpatient clinics in three regions of Norway (N = 208).
Results Assessing the Norwegian ICIQ-B’s content validity revealed that the questionnaire was relevant, comprehensive, and understandable. Missing data were infrequent (3.3%), and no floor or ceiling effects emerged. Three-factor and two-factor solution models, both with advantages and disadvantages, were found. The three-factor model offered the most parsimonious solution by covering most of the original scale, albeit with an unacceptably low reliability (α = .37) for the construct of bowel pattern. The two-factor model showed good reliability in terms of internal consistency for the constructs of bowel control (α = .80) and impact on QoL (α = .85) but was less parsimonious due to dismissing seven of the original 17 items and excluding the bowel pattern construct. Test–retest reliability demonstrates good stability for the Norwegian version, with an intra-class correlation coefficient of .90–.95 and weighted kappa of .39–.87 for single items.
Conclusions Although the Norwegian version of ICIQ-B demonstrates good stability and content validity, the original constructs of bowel pattern and bowel control had to be adapted, whereas the construct of impact on QoL remained unchanged. Further psychometric testing of the Norwegian ICIQ-B’s factor structure is therefore recommended. Supplementary Information The online version contains supplementary material available at 10.1186/s13690-022-01004-z.


Introduction
Anal incontinence (AI) is a debilitating condition that impacts an individual's self-esteem and quality of life (QoL) and may cause significant secondary morbidity, disability, and economic burden [1]. In contrast to faecal incontinence (FI), defined as "the involuntary loss of liquid or solid stool that is a social or hygienic problem," AI entails the involuntary loss of not only stool but also flatus from the rectum due to the inability to control bowel movements [2]. Thus, AI ranges from the occasional leakage of stool while passing gas to a complete loss of bowel control [3]. Because AI encompasses the loss of flatus and stool, its estimated pooled prevalence rate amongst home-dwelling adults is 15-17%, whereas FI's is only 5.9% [4]. A population-based cross-sectional study among Norwegian women aged 30 and older found that 19.1% of the women reported AI, while 3.0% reported FI [5]. No studies have reported AI or FI prevalence among Norwegian home-dwelling men. However, because many patients avoid reporting FI, its prevalence may be underestimated [4,6]. The highest prevalence is found among older people residing in care homes with an estimated FI prevalence of 42.8% [7].
The aetiology of AI is complex and multifactorial. Continence depends on the interaction between the anal sphincter complex, stool consistency, rectal reservoir function and neurological function. Disease processes or structural defects that alter any of those components can lead to FI [8]. Diarrhoea and altered bowel habits, inflammatory bowel disease, diet intolerance and constipation with paradoxical diarrhoea represent the most frequent independent risk factors for AI [9]. The most common structural causes, however, result from obstetrical injury [10], anorectal surgeries [11] and rectal prolapse [12,13]. Depending on the presenting circumstances, FI is commonly classified as passive incontinence (i.e. involuntary discharge without any awareness), urge incontinence (i.e. discharge despite active attempts to retain it), and faecal seepage (i.e. leakage of stool with grossly normal continence and evacuation) [14, p. 1585].
Due to AI's complex aetiology, treatment needs to be tailored to the individual's circumstances [15]. Although several scoring systems are commonly used to assess AI, no investigative tools specifically link symptoms of AI to QoL [8]. For clinicians as well as researchers, validated questionnaires and scales play an integral role in identifying symptoms of a disease, assessing patients' QoL, and objectively characterising any phenomenon detected [16]. Amongst such instruments, the International Consultation on Incontinence Questionnaire-Bowel (ICIQ-B) is a self-report, condition-specific questionnaire designed to assess symptoms of AI and its impact on QoL [17,18]. As part of the International Consultation on Incontinence's suite of validated questionnaires on incontinence [19], the ICIQ-B includes 21 main items, 17 of which address three scored factors: Bowel Pattern, Bowel Control, and Impact on QoL. In addition, to evaluate important issues from the perspectives of clinicians and patients, the ICIQ-B includes four unscored items: one representing the Bristol Stool Chart of stool consistency [20] and three others respectively concerning strain, worry and the restriction of sexual activities due to AI. Tailored for use by clinicians in both primary and secondary healthcare, the ICIQ-B is designed to screen for AI, to obtain a brief yet comprehensive summary of the level, impact, and perceived cause of symptoms of AI, and to facilitate better patient-clinician discussions [17,18]. The ICIQ-B is intended for both clinical assessment and research. The 21 items are therefore divided into two parts: an A-question representing the main issue, accompanied by a B-question ("How much does this bother you?"), which is particularly important from a clinical perspective. The A-questions are measured on a 5- or 6-point Likert scale, while the B-questions are measured on a scale from 0 (not at all) to 10 (a great deal). One item, Item 3, has a third question, since the main question regarding the frequency of opening one's bowels is further divided into (a) usual, (b) at worst and (c) how much does this bother you? (Additional file 1).
Validated patient-reported outcome measures not only help patients and clinicians to make better decisions but also enable comparisons of providers' performance to stimulate improvements in services. They are also well-suited for cross-national comparisons of research [21,22]. To date, the ICIQ-B, originally developed in British English [17,18], has been translated and validated in Spanish (i.e. in Chile), albeit only regarding content validity based on cognitive interviews [23]. Although an American English online version of the ICIQ-B has been psychometrically evaluated against an American English paper version [24], the extent of testing was limited. Even so, both cited studies involved assessing the test-retest reliability, which proved to be good in both cases [23,24]. Moreover, the psychometric evaluation conducted in the United States demonstrated the ICIQ-B's convergent validity and reasonable response to change at follow-up 3 months after the non-surgical treatment of FI, as well as its good internal consistency for the constructs of impact on QoL and bowel control. Meanwhile, having tested the American English version of the ICIQ-B, Markland et al. [24] demonstrated its fair internal consistency for the construct of bowel pattern. However, neither the Spanish nor the American English translation of the ICIQ-B has been assessed for structural validity. Beyond that, a review of QoL measures in relation to FI has shown that the original British English version of the ICIQ-B lacks sufficient structural validity [25]. Thus, because the ICIQ-B's factor structure seems to be unclear, we evaluated the structural validity, reliability, and content validity of a Norwegian version of the scale. 
The research question was addressed in accordance with the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN guidelines) [26,27], which address evidence related to structural validity, reliability, and content validity, all as central, interrelated properties of a given measurement model. Whereas structural validity (i.e. dimensionality) concerns the homogeneity of items, that is, whether items match their respective constructs [28], reliability encompasses a scale's consistency and lack of error [28]. By further contrast, content validity explores whether the theoretical content of constructs is adequately represented by questionnaire items in terms of relevance and comprehensiveness [29].

Translation and cultural adaptation
First, the ICIQ-B was translated from British English to Norwegian by a bilingual Norwegian-English translator, followed by a back-translation into English conducted by another bilingual Norwegian-English translator [19]. Second, the back-translation was evaluated by the International Consultation on Incontinence Questionnaire group [30] (i.e. the British English instrument's developers), who provided useful comments regarding possible ambiguities and other flaws that guided minor adjustments to the Norwegian ICIQ-B. Third, the Norwegian version was pilot-tested for comprehensiveness, readability, and equivalence [29] in cognitive interviews with 10 patients with AI living in Norway. Fourth, comments were gathered from six Norwegian bi- or monolingual multidisciplinary clinical experts to further assess comprehensiveness, readability, and equivalence. As minor discrepancies were identified and amended between each step, a comprehensible Norwegian version of the ICIQ-B gradually emerged (see Fig. 1).

Participants and sampling procedure
In our study, three samples were recruited. The first was a sample of 10 patients, both men and women, recruited from the outpatient gastrointestinal surgery clinic of St. Olav's University Hospital in Trondheim to participate in cognitive interviews. Patients with AI were invited to participate in the interview study by a nurse contact who provided them with information about the study, after which patients could contact the researcher directly. Written consent was obtained from the patients before their interviews commenced. Second, a sample of six clinical experts in AI (i.e. colorectal surgeons, stoma nurses and physiotherapists) from the three participating hospitals (i.e. St. Olav's University Hospital in Trondheim, University Hospital Northern Norway in Tromsø and Akershus University Hospital in Oslo) were recruited to evaluate the Norwegian ICIQ-B's comprehensiveness, relevance, and wording. The clinical experts were sent the Norwegian and original British English versions of the questionnaire via email, and the research team received their feedback either by email or orally during in-person meetings, depending on each expert's preference. The cognitive interviews with patients and the evaluation by clinical experts were both part of pilot-testing the translated Norwegian version of the questionnaire and served to establish the foundation for the content validity and cultural equivalence between the Norwegian and British English versions of the ICIQ-B.
Third, to test the psychometric properties of the Norwegian ICIQ-B, patients referred from their general practitioners to outpatient clinics in the three mentioned university hospitals due to AI were recruited to complete a paper-based questionnaire. The three hospitals represent three regions of Norway from south to north. To be included, new patients had to be attending the outpatient clinic due to AI, had to have never received treatment for AI and had to be able to provide their written consent to participate in the study and to complete the questionnaire independently. Patients who participated in the cognitive interviews were not enrolled in that subsequent part of the study. A patient sample 10 times the number of items was needed to be able to perform a factor analysis of the Norwegian ICIQ-B [31]. Because the original questionnaire consists of 21 items, four of which are unscored and were excluded from our analysis, and because the remaining 17 items implied a sample size of approximately 170 patients, we aimed to include at least 200 questionnaire respondents.
Eligible patients were invited to participate by using the hospitals' routines to summon patients. Along with an invitation to a medical consultation at the hospital, eligible patients received an information sheet about the study together with the questionnaire and a return envelope. Patients who attended the consultation subsequently received another invitation to participate in the study, which both reminded patients who had not yet responded to the questionnaire and served as a retest for those who had already returned their responses. Patients were recruited beginning in 2011 until 200 had been enrolled (i.e. in 2013).
Structural validity of the main A-questions was assessed using confirmatory factor analysis (CFA) and exploratory factor analysis (EFA; principal axis factoring). In our study, the model fit was assessed by χ² statistics and two conventional fit indices, the root-mean-square error of approximation (RMSEA) and the standardised root mean square residual (SRMR), with values less than .05 indicating a good fit and values from .05 to .10 indicating an acceptable fit [33,34]. Furthermore, the comparative fit index (CFI) and the Tucker-Lewis index (TLI) with acceptable fit set at .95 and good fit at .97 were used [33-36]. Because skewness and kurtosis were significant, the Satorra-Bentler-corrected χ² was applied as recommended when analysing non-normal continuous endogenous variables [37]. EFA was performed with oblimin rotation, and observations with one or more missing values across the 17 variables included in any of the three constructs were deleted. No replacements were made for missing data.
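To make the cut-offs above concrete, the following minimal Python sketch classifies fit-index values according to the thresholds stated in the text. The function names and the label "poor" for values outside the stated ranges are illustrative assumptions, not part of the study's analysis.

```python
def rate_residual_index(value):
    """Rate an RMSEA or SRMR value: < .05 good, .05 to .10 acceptable."""
    if value < 0.05:
        return "good"
    if value <= 0.10:
        return "acceptable"
    return "poor"  # assumed label for values above .10


def rate_incremental_index(value):
    """Rate a CFI or TLI value: >= .97 good, >= .95 acceptable."""
    if value >= 0.97:
        return "good"
    if value >= 0.95:
        return "acceptable"
    return "poor"  # assumed label for values below .95


# Hypothetical values, for illustration only
print(rate_residual_index(0.06))      # acceptable
print(rate_incremental_index(0.98))   # good
```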
Next, content validity was assessed in three ways. First, cognitive interviews with patients in the target population and reviews of the scale by clinical experts were analysed on a question-by-question basis and any  [38]. Second, floor and ceiling effects were considered problematic if more than 15% of respondents achieved the highest- or lowest-possible score [39,40]. Third, at the item level, less than 3% missing data was acceptable, whereas more than 15% was not [29].
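The floor, ceiling and missing-data criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and toy scores are hypothetical, and the label "unclassified" for missing-data rates between 3% and 15% is our own placeholder, because the criteria above do not rate that range.

```python
def share(scores, value):
    """Proportion of respondents whose score equals `value`."""
    return sum(1 for s in scores if s == value) / len(scores)


def floor_ceiling_effects(scores, lowest, highest, threshold=0.15):
    """Flag an effect when more than `threshold` of respondents hit an extreme score."""
    return {
        "floor": share(scores, lowest) > threshold,
        "ceiling": share(scores, highest) > threshold,
    }


def rate_missing(proportion_missing):
    """Rate item-level missing data: < 3% acceptable, > 15% not acceptable."""
    if proportion_missing < 0.03:
        return "acceptable"
    if proportion_missing > 0.15:
        return "not acceptable"
    return "unclassified"  # assumed label; the stated criteria leave this range open


# Hypothetical scale scores on a 0-10 range
toy_scores = [0, 0, 3, 5, 7, 9, 10, 4, 6, 8]
print(floor_ceiling_effects(toy_scores, lowest=0, highest=10))
print(rate_missing(0.005))
```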
The reliability of the questionnaire and its subscales was assessed in terms of internal consistency and stability over time. To assess the internal consistency of the A-items, we used the reliability coefficients of Cronbach's alpha (α) and composite reliability (ρ_c), with values ≥.7 considered to be good [29]. Test-retest reliability was evaluated using intra-class correlation coefficients (ICC) to measure the stability of scales over time and weighted kappa values with linear weights for single items [39,40]. In ICC analysis, a two-way mixed-effects analysis of variance (ANOVA) was used because time is a relevant factor in test-retest studies of patient-reported outcome measures. Also, the ICC formula for absolute agreement between scores was considered preferable [41]. Additionally, measurement error (i.e., standard error of measurement and smallest detectable change) was reported [40].
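As an illustration of the internal-consistency coefficient named above, the following minimal Python sketch computes Cronbach's alpha with the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    `items` holds one list of scores per item, with respondents in the
    same order in every list and no missing values.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    item_variance_sum = sum(variance(column) for column in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical toy data: three items answered by five respondents
toy_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 2))
```

Sample variances (n − 1 denominator) are used for both the items and the total score, so the ratio is unaffected by that choice.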

Ethical considerations
The Regional Committee for Medical and Health Research Ethics reviewed and approved the study (2009/1225), as did the institutional review board at the three university hospital clinics. Each patient was informed about the study and signed a written declaration of consent to participate. Participants were informed that their participation in the study was voluntary and that they could withdraw their consent at any given time and for any or no reason.

Results
During the 2-year period of data collection, 360 invitations for participation were sent to eligible patients. At baseline, 208 Norwegian patients with AI completed the questionnaire (57.8% response rate), 50 of whom completed it again after 1-6 weeks (i.e., retest). Observations with one or more missing values across the 17 variables included in any of the three factors were deleted, which left a sample of 161.
At baseline, most respondents were women (87.3%). The age range was 18-89 years (mean = 59.2, SD = 15.0), as shown in Table 1. Scale scores for the original constructs appear in Table 2.

Exploratory factor analysis (EFA)
To explain as much of the total variance as possible with as few factors as possible, we subjected the ICIQ-B to EFA. The Kaiser-Meyer-Olkin measure of sampling adequacy, .88, exceeded the recommended value of .60, and Bartlett's test of sphericity showed statistical significance (p < .0001), which supported the factorability of the correlation matrix. A factor loading of .32 indicates approximately 10% overlapping variance with the factor's other items; thus, a minimum loading of .32 is considered acceptable [42]. Accordingly, a cross-loading item would load at .32 or higher on two or more factors. When subjecting the ICIQ-B to EFA, we sought the cleanest factor structure. Because the original ICIQ-B contains three factors, we expected a three-dimensional structure with correlated factors.
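The .32 cut-off follows from squaring the loading (.32² ≈ .10, i.e. about 10% shared variance). A minimal Python sketch of that arithmetic and of the cross-loading rule is given below; the function names and the loading values are hypothetical and do not reproduce Table 3.

```python
CUTOFF = 0.32  # minimum loading treated as acceptable in the text


def shared_variance(loading):
    """Proportion of variance an item shares with a factor (squared loading)."""
    return loading ** 2


def salient_factors(loadings, cutoff=CUTOFF):
    """Indices of the factors on which an item loads at or above the cut-off."""
    return [i for i, value in enumerate(loadings) if abs(value) >= cutoff]


def is_cross_loading(loadings, cutoff=CUTOFF):
    """An item cross-loads when it reaches the cut-off on two or more factors."""
    return len(salient_factors(loadings, cutoff)) >= 2


# Hypothetical loadings of three items on three factors
toy_loadings = {
    "item_a": [0.71, 0.10, 0.05],
    "item_b": [0.45, 0.38, 0.02],
    "item_c": [0.20, 0.15, 0.30],
}
print(round(shared_variance(CUTOFF), 4))
print({name: is_cross_loading(row) for name, row in toy_loadings.items()})
```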
Five factors with eigenvalues greater than or equal to 1.0 were extracted (see Table 3), with factor loadings of .38-.94. Figure 2 shows the scree-test of the ICIQ-B data, with five factors explaining 68.17% of the variance; Factor 1 explained 38.37%, Factor 2 explained 9.11%, Factor 3 explained 8.21%, Factor 4 explained 6.37%, and Factor 5 explained 6.11%. That EFA-suggested solution revealed five factors with two to five items each. Four of the factors displayed good or acceptable Cronbach's alpha coefficients between .64 and .85, whereas the other had a poor one (α = .55). Table 3 lists the loadings and variance for that rotated five-factor solution of the ICIQ-B. Communalities for the 17 items ranged from .25 (Item 7) to .86 (Item 21); values greater than .40 are recommended [43].

Confirmatory factor analysis (CFA)
First, we tested the original three-dimensional structure involving 17 items following Cotterill et al. [18].  was good, the other fit indices indicated misspecification. Reliability assessed with the composite reliability coefficient (ρ_c) was good for two of the three dimensions (see Table 4). Analysing the residuals and modification indices (MI) revealed no significant residuals, but 10 MIs were greater than 10; the pairs of Thus far, we had dismissed Items 13 and 4. Nevertheless, though the χ²/df was good, the fit remained poor, and five MIs greater than 10 were present. Items 8 and 12 had an MI of 19.43; Item 12-"Are you able to control mucus (discharge) leaking from your back passage?"-shared a considerable amount of error variance with Item 8 (i.e., "Do you experience any staining of your underwear or need to wear pads because of     you open your bowels in 24 hours?"), Item 6 (i.e. "Do you use medications such as tablets or liquids to stop your bowels from opening?") and Item 7 (i.e. "Do you experience pain/soreness around your back passage?")-had a low composite reliability (ρ_pattern = .51). However, the reliability was good for the other two factors (ρ_control = .82 and ρ_QoL = .89). Examining the theoretical content of the items belonging to the Bowel Pattern factor clarified that they address different aspects, which explains their low internal consistency. Those items include aspects ranging from the frequency of opening one's bowels to using medication and experiencing pain. Apparently, the items neither shared much variance nor seemed to represent reliable indicators for the same construct.
Thus, Model 8, with two factors (i.e. Bowel Control and Impact on QoL) and 10 items, was less parsimonious but demonstrated the statistically best fit. By comparison, Model 6 also included those factors along with the Bowel Pattern factor, with 13 items, and was therefore the most parsimonious measurement model with a good fit. Models 6 and 8 are illustrated in Figs. 3 and 4, respectively.

Content validity and scale reliability
Cognitive interviews with patients with AI and evaluations of the Norwegian ICIQ-B version by clinical experts indicated the Norwegian ICIQ-B's good face and content validity in terms of relevance, comprehensiveness, readability, and equivalence. The overall percentage of missing data at baseline was 3.3%, ranging from 0.5 to 11.1% for single items. Because none of the proposed scales had the lowest- or highest-possible score with more than 15% frequency, no floor or ceiling effects were found in the total score distributions.
Regarding reliability, internal consistency in the proposed factors showed Cronbach's alphas (α) from .37 to .85, as presented in Table 6. Test-retest stability revealed ICCs between .90 and .94. Concerning the stability of single items, 13 items had weighted kappa values of .61-.80, whereas five had values of .41-.60, two of .81-1.00 and one of .21-.40. The factors' standard error of measurement (SEM), an expression of the average measurement error, was estimated to be 0.42-0.73 points, while the smallest detectable change (SDC95), indicating the uncertainty of that average, was 1.16-2.02 points [29]. For example, although the SEM for the Impact on QoL factor was 0.73, a patient's score has to change by 2.02 points from test to retest for one to be 95% certain that a change beyond the measurement error has occurred.
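The relation between the two measurement-error figures can be checked with a short calculation. The sketch below assumes the conventional formula SDC95 = 1.96 × √2 × SEM, which is not spelled out in the text but reproduces the reported range; the function name is hypothetical.

```python
import math


def smallest_detectable_change(sem, z=1.96):
    """SDC95 = z * sqrt(2) * SEM (conventional formula, assumed here)."""
    return z * math.sqrt(2) * sem


# SEM bounds reported in the text
for sem in (0.42, 0.73):
    print(sem, round(smallest_detectable_change(sem), 2))
# prints:
# 0.42 1.16
# 0.73 2.02
```

The factor √2 reflects that a change score combines the measurement error of two administrations (test and retest).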

Discussion
The original ICIQ-B includes 17 items representing three factors (i.e. Bowel Pattern, Bowel Control, and Impact on QoL), along with four unscored items. In our study, we translated the ICIQ-B scale into Norwegian and tested its psychometric properties (i.e. structural validity, reliability, and content validity) amongst adults in Norway.

Structural validity
When evaluating a measurement scale's structural validity, two aspects are vital: the data's underlying dimensionality (i.e., not too many or too few factors) and the adequacy of the scale's individual items [29]. Showing eigenvalues exceeding 1.0, our EFA suggested five factors: two substantial factors with three and five items and three weak factors with three or two items each. The EFA also revealed cross-loadings, and because the original ICIQ-B contains only three factors [18], its dimensionality seemed uncertain. However, because conclusions should not be drawn solely based on EFA, we conducted a CFA, which revealed both a three-factor solution and a two-factor measurement model showing good fit. However, several items seemed to indicate misspecification. Both reliability and structural validity relate to the adequacy of a scale's items. Good indicators of a factor show highly significant factor loadings, preferably greater than .70, accompanied by strong squared multiple correlations (R²), which represent how much variation in an item is explained by the latent construct [44]. In our study, all loadings were significant at the 1% level except that of Item 7. Regarding Model 6, Table 5 shows that seven factor loadings were excellent (>.70), four were good to fair (.45-.55), and two, for Items 7 and 8, were very low (<.45) and hardly explained any variance in the respective construct [28]. Thus, 11 items were rated as reliable indicators, whereas Items 7 and 8 displayed poor reliability. For Model 6, the factors of Bowel Control and Impact on QoL had good alpha values and composite reliability (ρ_c), whereas the Bowel Pattern factor demonstrated low internal consistency (ρ_c = .51) and thus low reliability [33,45]. Accordingly, the dimensionality seemed imprecise, as further pinpointed by Item 5's far stronger loading on another factor than originally determined. Allowing Item 5 (i.e. "Do you have to rush to the toilet when you need to open your bowels?") to relate to the Bowel Control factor instead of the Bowel Pattern factor (Model 6) improved the model fit considerably. Based on the low reliability of the Bowel Pattern factor, we tested a two-factor solution excluding that factor. That solution (i.e. Model 8) showed good reliability, with highly significant factor loadings, good reliability coefficients and a nearly acceptable fit. Looking at the two-factor model including Item 5 in the Bowel Control factor, the solution had good reliability and clear dimensionality. However, to achieve a good model fit, some items had to be removed, namely Items 12-14, all of which solicit information about bowel accidents. In our study, respondents seemed to assume those three items sought to assess roughly the same thing, which generated substantial correlated error variance that again hampered the model fit.
In our investigation, the original three-factor structure with 17 items did not fit well with the data. Model 6, including three factors and 13 items, was the most parsimonious model with a good fit, whereas Model 8, including two factors with 10 items, was less parsimonious but demonstrated a statistically better fit. Both models contained identical versions of the factors of Bowel Control and Impact on QoL and differed only in Model 6's inclusion of the third factor, Bowel Pattern.

Content validity
The translated scale's relevance, comprehensiveness and comprehensibility were gauged through cognitive interviews with patients from the target group and evaluations by a multidisciplinary group of clinical experts, who deemed that the content and wording of the Norwegian ICIQ-B's items corresponded well with the constructs intended to be measured; that is, the items captured AI's complexity [29]. However, the items did not fit well into the three constructs, especially for the construct of bowel pattern, in which the items were overly broad and caused insufficient internal consistency, as also seen in the original British English version, the Spanish version and the American English version [18,23,24]. Moreover, the original ICIQ-B includes four unscored items not encompassed within the original dimensionality. The four items (i.e. Items 4 and 12-14) removed from the Norwegian scale, however, could be placed together with those four unscored items, which would support the Norwegian ICIQ-B's clinical relevance.
The Norwegian ICIQ-B with the adapted factor structure demonstrated promising psychometric properties.
The level of missing items in the questionnaire was acceptable, which confirmed that the items were relevant, straightforward, and meaningful to the respondents. One item had more than 3% missing data, namely Item 18 (i.e. "Do you restrict your sexual activities because of your bowels?"), with 11% missing data. That outcome is unsurprising, because sexuality may be a sensitive topic or even be perceived as irrelevant. The absence of floor and ceiling effects demonstrated that the scale could produce a good distribution of responses to a given item and that scores at the scale's upper or lower levels show no clustering or skewness. That measurement property is also important regarding the questionnaire's discriminative power. For example, a maximum score would preclude detecting any further change in that direction following an intervention.

Scale reliability
Testing the Norwegian ICIQ-B demonstrated its good reliability in terms of internal consistency and excellent stability. While the Bowel Control factor had an acceptable Cronbach's alpha, the Impact on QoL factor had a good one. However, for the Bowel Pattern factor (α = .37), the reliability coefficient was unacceptably low (<.5) [46]. The poor reliability of the Bowel Pattern factor has previously been identified, including in the initial study by the scale's developers [18,24]. Consistent with the American English and Spanish versions of the ICIQ-B and the initial study performed by the developers [18,23,24], stability over time was excellent for all three constructs [47]. Furthermore, at the single-item level, 13 items had substantial weighted kappa values, two had nearly perfect values, five had moderate values and one had a fair value [48]. The good test-retest reliability of an instrument ensures that measurements obtained are both representative and stable over time [29].

Limitations
A major strength of our study was the rigorous methodology employed in translating and validating the Norwegian ICIQ-B following COSMIN guidelines [26]. However, some limitations should be noted. First, the sample size of 208 was reduced to 161 due to missing data. The response rate was nevertheless sufficient to perform the analysis. Second, this study employed a rather wide time frame between test and retest, with a risk of recall bias and changes in the respondents' health status. Finally, it is worth noting that a good model fit does not guarantee that we have obtained 'the true model'; alternative models might fit the data as well as the model found [49].